An Alternative Method of Training Probabilistic LR Parsers
Abstract
We discuss existing approaches to train LR parsers, which have been used for statistical resolution of structural ambiguity. These approaches are nonoptimal, in the sense that a collection of probability distributions cannot be obtained. In particular, some probability distributions expressible in terms of a context-free grammar cannot be expressed in terms of the LR parser constructed from that grammar, under the restrictions of the existing approaches to training of LR parsers. We present an alternative way of training that is provably optimal, and that allows all probability distributions expressible in the context-free grammar to be carried over to the LR parser. We also demonstrate empirically that this kind of training can be effectively applied on a large treebank.
Similar resources
Generalized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...
The Lane Table Method Of Constructing LR(1) Parsers
The lane-tracing algorithm is a reduced-space LR(1) parser generation algorithm. The previous version of lane-tracing algorithm regenerates states involved in reduce/reduce conflict by employing the practical general method. In this paper we describe an alternative lane-tracing approach, which regenerates states based on the lane table method. We discuss the details of this new algorithm, study...
Head-Driven PCFGs with Latent-Head Statistics
Although state-of-the-art parsers for natural language are lexicalized, it was recently shown that an accurate unlexicalized parser for the Penn tree-bank can be simply read off a manually refined treebank. While lexicalized parsers often suffer from sparse data, manual mark-up is costly and largely based on individual linguistic intuition. Thus, across domains, languages, and tree-bank annotat...
Faster Generalized LR Parsing
Tomita devised a method of generalized LR (GLR) parsing to parse ambiguous grammars efficiently. A GLR parser uses linear-time LR parsing techniques as long as possible, falling back on more expensive general techniques when necessary. Much research has addressed speeding up LR parsers. However, we argue that this previous work is not transferable to GLR parsers. Instead, we speed up LR parsers b...
Eine Rekonstruktion der LR-Theorie zur Elimination von Redundanz mit Anwendung auf den Bau von ELR-Parsern
In this thesis, we present work on two problems from the field of LR parser construction, a family of syntax analysis techniques for context-free languages. In the first part, we show that the traditional LR parser construction technique produces parsers which are burdened with a substantial amount of systematic redundance. We develop a new and well-founded method which defines what we call gen...
Publication date: 2004